Food Label Analysis

library(tidyverse)
library(reactable)
library(htmltools)

Introduction

Food labels can be confusing and hard to read, with small numbers and text in the Nutrition Information Panel table. Ingredients lists can also be long and are printed in equally small text. This information is difficult to process on the go and without a reference point.

As a side project, I am creating a website to visualise food labelling data from food packaging, using Node.js, D3, React and possibly a MongoDB database.

This data analysis helps to understand the data behind the web application using visual interactive tables.

TL;DR: jump to the Data Visualisation section.

Data

Food and nutrition data is available from the Ministry of Health.

Plant & Food Research and the Ministry of Health jointly own the New Zealand Food Composition Database. This database provides a comprehensive collection of nutrition information panel data as seen on food packaging.

The FOODfiles™ Data is available subject to the FOODfiles™ Data Licensing terms.

I tried downloading various files from the website, but the easiest data file to use for this analysis is the Standard DATA.AP file, which contains the data in a table format.

Ideally, the information panel data would be available in CSV format as a direct link on the foodcomposition website, so that it could be imported directly for better reproducibility.

standard <- readxl::read_xlsx("Standard DATA.AP.xlsx",skip = 1)

Data Cleaning

Let’s extract the nutrient columns related to the nutrition information panels.

standard_nip <- standard %>% 
  select(`Food Name`,Chapter,`Energy, total metabolisable, carbohydrate by difference, FSANZ (kJ)`,`Protein, total; calculated from total nitrogen`,`Fat, total`,`Fatty acids, total saturated`,`Sugars, total`,`Fibre, total dietary`,Sodium) %>% 
  slice(-1) %>% 
  mutate_at(vars(3:9), as.numeric)

We can extract the units of these nutrients.

units <- standard %>% 
  select(
    `Food Name`,
    Chapter,
    `Energy, total metabolisable, carbohydrate by difference, FSANZ (kJ)`,
    `Protein, total; calculated from total nitrogen`,
    `Fat, total`,
    `Fatty acids, total saturated`,
    `Sugars, total`,
    `Fibre, total dietary`,
    Sodium
  ) %>% 
  slice(1)   # keep only the first row, which holds the units

We can rename the columns with a snake case naming convention.

names(standard_nip) <- c("food_name","chapter","energy","protein","fat_total","fat_saturated","sugars","fibre","sodium")
names(units) <- c("food_name","chapter","energy","protein","fat_total","fat_saturated","sugars","fibre","sodium")
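
To make the cleaning steps concrete, here is a minimal sketch on a made-up table. The `raw`, `toy_units` and `toy_nip` names and all values are illustrative and not taken from the database. It assumes, as in the real file, that the first data row holds the units, so every column is read in as text.

```r
library(dplyr)
library(tibble)

# Toy stand-in for the spreadsheet: row 1 holds units, so all columns are text
raw <- tibble(
  `Food Name` = c("Food Name", "Toy food A", "Toy food B"),
  Sodium      = c("mg/100g", "6.9", "340")
)

# The first row carries the units
toy_units <- raw %>% slice(1)

# Everything after the first row is data; convert the nutrient to numeric
toy_nip <- raw %>%
  slice(-1) %>%
  mutate(Sodium = as.numeric(Sodium))

toy_units$Sodium        # "mg/100g"
is.numeric(toy_nip$Sodium)  # TRUE
```

Splitting the units row off before converting avoids the `NA` values that `as.numeric()` would otherwise produce for text such as "mg/100g".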

Exploratory Data Analysis

The raw standard dataset has 2768 rows (the first of which holds the units) and 89 columns.

Now take a look at summary statistics with the skimr R package of the standard_nip dataset.

standard_nip %>% 
  skimr::skim()
Data summary
Name Piped data
Number of rows 2767
Number of columns 9
_______________________
Column type frequency:
character 2
numeric 7
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
food_name 0 1 4 172 0 2767 0
chapter 0 1 1 1 0 22 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
energy 0 1 868.64 712.83 0 299.33 683.43 1269.20 3700.00 ▇▅▂▁▁
protein 0 1 9.62 9.63 0 1.38 6.11 16.69 84.36 ▇▂▁▁▁
fat_total 0 1 9.97 16.69 0 0.47 3.26 12.19 100.00 ▇▁▁▁▁
fat_saturated 0 1 3.59 6.99 0 0.08 0.87 4.10 94.01 ▇▁▁▁▁
sugars 0 1 7.68 14.82 0 0.00 1.80 7.60 100.70 ▇▁▁▁▁
fibre 0 1 2.38 4.95 0 0.00 0.90 2.75 70.10 ▇▁▁▁▁
sodium 0 1 322.65 1566.46 0 9.29 65.00 340.00 38700.00 ▇▁▁▁▁

Now the units:

units %>% 
  glimpse()
## Observations: 1
## Variables: 9
## $ food_name     <chr> "Food Name"
## $ chapter       <chr> "Chapter"
## $ energy        <chr> "kJ/100g"
## $ protein       <chr> "g/100g"
## $ fat_total     <chr> "g/100g"
## $ fat_saturated <chr> "g/100g"
## $ sugars        <chr> "g/100g"
## $ fibre         <chr> "g/100g"
## $ sodium        <chr> "mg/100g"

Since they have different units, the g/100g nutrients could be compared as a group, whereas the other units would be compared individually.
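
One way to find the nutrients that share a unit is sketched below. The `units_demo` tibble is rebuilt by hand from the `glimpse()` output above so that the block runs on its own; `unit_lookup` and `gram_nutrients` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Units copied from the glimpse() output, without the two label columns
units_demo <- tibble(
  energy = "kJ/100g", protein = "g/100g", fat_total = "g/100g",
  fat_saturated = "g/100g", sugars = "g/100g", fibre = "g/100g",
  sodium = "mg/100g"
)

# Reshape to one row per nutrient so the unit can be filtered on
unit_lookup <- units_demo %>%
  pivot_longer(everything(), names_to = "nutrient", values_to = "unit")

# Nutrients measured in g/100g can be compared on a common scale
gram_nutrients <- unit_lookup %>%
  filter(unit == "g/100g") %>%
  pull(nutrient)

gram_nutrients
```

Energy (kJ/100g) and sodium (mg/100g) fall outside this group and are best looked at on their own scales.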

To check that we have extracted the same nutrition information panel data that appears on food labels, we can look up Butter, unsalted and compare the values with its label:

standard_nip %>% 
  filter(str_detect(food_name,"Butter, unsalted"))
## # A tibble: 1 x 9
##   food_name   chapter energy protein fat_total fat_saturated sugars fibre sodium
##   <chr>       <chr>    <dbl>   <dbl>     <dbl>         <dbl>  <dbl> <dbl>  <dbl>
## 1 Butter, un~ F        3110.    0.32      83.6          54.1   0.54     0    6.9

What is the food with most energy?

standard_nip %>% 
  slice(which.max(energy)) %>% 
  select(food_name,energy) %>% 
  reactable()

What is the food with most protein?

standard_nip %>% 
  slice(which.max(protein)) %>% 
  select(food_name,protein) %>% 
  reactable()

What is the food with the most saturated fat? (Ranking by total fat simply returns oils, which are 100 g of fat per 100 g.)

standard_nip %>% 
  slice(which.max(fat_saturated)) %>% 
  select(food_name,fat_saturated) %>% 
  reactable()

What is the food with the most sugar?

standard_nip %>% 
  slice(which.max(sugars)) %>% 
  select(food_name,sugars) %>% 
  reactable()

What is the saltiest food?

standard_nip %>% 
  slice(which.max(sodium)) %>% 
  select(food_name,sodium) %>% 
  reactable()

Take a look at the miscellaneous food group, which includes herbs and condiments. What food items have the most energy, protein, saturated fat, sugars, fibre and sodium?

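We can write a function that uses tidy evaluation from the rlang R package to unquote a column name, and then extracts the top 5 foods by nutrient.

# Return the top 5 foods in the miscellaneous chapter ("P") for one nutrient.
# `nutrient` is a column name given as a string, e.g. "energy".
top5 <- function(nutrient) {
  standard_nip %>% 
    filter(chapter == "P") %>% 
    # Turn the string into a symbol and unquote it so arrange() sees a column
    arrange(desc(!!rlang::sym(nutrient))) %>% 
    select(food_name, all_of(nutrient)) %>% 
    slice(1:5)
}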

top5(nutrient= "energy") %>% reactable()
top5(nutrient= "protein") %>% reactable()
top5(nutrient= "fat_saturated") %>% reactable()
top5(nutrient= "sugars") %>% reactable()
top5(nutrient= "fibre") %>% reactable()
top5(nutrient= "sodium") %>% reactable()
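
The repeated calls can also be written as a loop over a vector of nutrient names. The sketch below is self-contained, so it uses a made-up `toy_nip` table and a hypothetical `top_n_by()` variant that takes the data as an argument and uses the `.data` pronoun instead of `rlang::sym()`; neither is part of the analysis above.

```r
library(dplyr)
library(tibble)

# Toy stand-in for standard_nip (made-up values)
toy_nip <- tibble(
  food_name = c("Toy herb", "Toy sauce", "Toy spice"),
  chapter   = "P",
  energy    = c(100, 400, 1200),
  sodium    = c(20, 5000, 60)
)

# Same idea as top5(), with the data and the number of rows as arguments
top_n_by <- function(data, nutrient, n = 5) {
  data %>%
    filter(chapter == "P") %>%
    arrange(desc(.data[[nutrient]])) %>%
    select(food_name, all_of(nutrient)) %>%
    slice(seq_len(n))
}

# One result table per nutrient, returned as a named list
nutrients  <- c("energy", "sodium")
top_tables <- lapply(
  setNames(nutrients, nutrients),
  function(x) top_n_by(toy_nip, x, n = 2)
)

top_tables$sodium
```

Each element of `top_tables` is an ordinary tibble, so it can be passed to `reactable()` in the same way as the individual calls.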

Data Visualisation

Now create HTML bar charts with the reactable and htmltools R packages.

I chose this fun colour palette to distinguish the units.

# Set global theme
options(reactable.theme = reactableTheme(
  style = list(fontFamily = "-apple-system, BlinkMacSystemFont, Segoe UI, Helvetica, Arial, sans-serif"),
  color = "hsl(233, 9%, 87%)",
  backgroundColor = "hsl(233, 9%, 19%)",
  borderColor = "hsl(233, 9%, 22%)",
  stripedColor = "hsl(233, 12%, 22%)",
  highlightColor = "hsl(233, 12%, 24%)",
  inputStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  selectStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  pageButtonHoverStyle = list(backgroundColor = "hsl(233, 9%, 25%)"),
  pageButtonActiveStyle = list(backgroundColor = "hsl(233, 9%, 28%)")
))
# Render a bar chart with a label on the left
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
  bar <- div(style = list(background = fill, width = width, height = height))
  chart <- div(style = list(flexGrow = 1, marginLeft = "8px", background = background), bar)
  div(style = list(display = "flex", alignItems = "center"), label, chart)
}

reactable(standard_nip %>% select(-chapter), 
          columns = list(
  food_name = colDef(name = "Food Name", align = "left"),
  energy = colDef(name = "Energy (kJ/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$energy) * 100, "%")
    bar_chart(round(value,0), width = width,fill = "#E3A8CB", background = "#999999")
  }),
  protein = colDef(name = "Protein (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$protein) * 100, "%")
    bar_chart(round(value,0), width = width,fill = "#E98E10", background = "#999999")
  }),
  fat_total = colDef(name = "Fat Total (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$fat_total) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  fat_saturated = colDef(name = "Saturated Fat (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$fat_saturated) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  sugars = colDef(name = "Sugars (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$sugars) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  fibre = colDef(name = "Fibre (g/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$fibre) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#E98E10", background = "#999999")
  }),
  sodium = colDef(name = "Sodium (mg/100g)", align = "left", cell = function(value) {
    width <- paste0(value / max(standard_nip$sodium) * 100, "%")
    bar_chart(round(value,0), width = width, fill = "#A2DC84", background = "#999999")
  })
),  
  filterable = TRUE,
  showPageSizeOptions = TRUE,
  striped = TRUE,
  highlight = TRUE)
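
Each column definition above differs only in its label, its maximum and its fill colour, and each bar width is the value as a percentage of the column maximum. A possible way to reduce the repetition is sketched below on a made-up two-row table. The `bar_width()` and `bar_col()` helpers are hypothetical names introduced here; `bar_chart()` is repeated from above so the block runs on its own.

```r
library(reactable)
library(htmltools)

# Bar chart cell renderer, as defined earlier in this post
bar_chart <- function(label, width = "100%", height = "16px", fill = "#00bfc4", background = NULL) {
  bar <- div(style = list(background = fill, width = width, height = height))
  chart <- div(style = list(flexGrow = 1, marginLeft = "8px", background = background), bar)
  div(style = list(display = "flex", alignItems = "center"), label, chart)
}

# Bar width as a percentage of the column maximum
bar_width <- function(value, max_value) paste0(value / max_value * 100, "%")

# Hypothetical helper: build one bar-chart column definition
bar_col <- function(name, max_value, fill) {
  colDef(name = name, align = "left", cell = function(value) {
    bar_chart(round(value, 0), width = bar_width(value, max_value),
              fill = fill, background = "#999999")
  })
}

# Made-up data to show the helper in use
toy <- data.frame(food_name = c("Toy food A", "Toy food B"), sodium = c(50, 200))

tbl <- reactable(toy, columns = list(
  food_name = colDef(name = "Food Name", align = "left"),
  sodium    = bar_col("Sodium (mg/100g)", max(toy$sodium), "#A2DC84")
))

bar_width(50, 200)  # "25%"
```

With a helper like this, the full table would need one `bar_col()` call per nutrient, and a change to the bar styling would only have to be made in one place.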

Conclusion

This reactable is a great interactive tool to summarise and explore the nutrition information panel data. It is possible to sort and filter, and also view the value of the nutrient relative to the range of the nutrient values across all foods.